Translating the InChI: adapting neural machine translation to predict IUPAC names from a chemical identifier
Abstract
We present a sequence-to-sequence machine learning model for predicting the IUPAC name of a chemical from its standard International Chemical Identifier (InChI). The model uses two stacks of transformers in an encoder-decoder architecture, a setup similar to the neural networks used in state-of-the-art machine translation. Unlike neural machine translation, which usually tokenizes input and output into words or sub-words, our model processes the InChI and predicts the IUPAC name character by character. The model was trained on a dataset of 10 million InChI/IUPAC name pairs freely downloaded from the National Library of Medicine’s online PubChem service. Training took seven days on a Tesla K80 GPU, and the model achieved a test set accuracy of 91%. The model performed particularly well on organics, with the exception of macrocycles, and was comparable to commercial IUPAC name generation software. The predictions were less accurate for inorganic and organometallic compounds. This can be explained by inherent limitations of standard InChI for representing inorganics, as well as low coverage in the training data.
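The character-by-character processing described in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the special tokens `<pad>`, `<sos>`, `<eos>` and the helper names `build_vocab`, `encode`, and `decode` are assumptions made for illustration, and the ethanol InChI/name pair is just a small worked example. The sketch only shows how a source string and a target string could each be turned into one integer id per character before being fed to an encoder-decoder model.

```python
# Hypothetical special tokens; the vocabulary actually used in the paper
# is not stated in the abstract.
PAD, SOS, EOS = "<pad>", "<sos>", "<eos>"


def build_vocab(strings):
    """Map the special tokens and every distinct character to an integer id."""
    chars = sorted({ch for s in strings for ch in s})
    return {tok: i for i, tok in enumerate([PAD, SOS, EOS] + chars)}


def encode(text, vocab):
    """One id per character, wrapped in start and end markers."""
    return [vocab[SOS]] + [vocab[ch] for ch in text] + [vocab[EOS]]


def decode(ids, vocab):
    """Invert encode(), dropping the special tokens."""
    inverse = {i: tok for tok, i in vocab.items()}
    return "".join(inverse[i] for i in ids if inverse[i] not in (PAD, SOS, EOS))


# Example pair: the standard InChI of ethanol and its IUPAC name.
inchi = "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"
name = "ethanol"

# Separate vocabularies for the source (InChI) and target (IUPAC name) sides.
src_vocab = build_vocab([inchi])
tgt_vocab = build_vocab([name])

src_ids = encode(inchi, src_vocab)
tgt_ids = encode(name, tgt_vocab)
```

Because each character is its own token, the vocabularies stay very small compared with word or sub-word vocabularies, at the cost of longer input and output sequences.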
Similar resources
InChI, the IUPAC International Chemical Identifier
This paper documents the design, layout and algorithms of the IUPAC International Chemical Identifier, InChI.
InChI - the worldwide chemical structure identifier standard
Since its public introduction in 2005 the IUPAC InChI chemical structure identifier standard has become the international, worldwide standard for defined chemical structures. This article will describe the extensive use and dissemination of the InChI and InChIKey structure representations by and for the world-wide chemistry community, the chemical information community, and major publishers and...
Detection of IUPAC and IUPAC-like chemical names
Motivation: Chemical compounds such as small signal molecules or other biologically active chemical substances are an important entity class in life science publications and patents. Several representations and nomenclatures for chemicals exist, such as SMILES, InChI, IUPAC or trivial names. Only SMILES and InChI names allow a direct structure search, but in biomedical texts trivial names and IUPAC-like ...
Translating Phrases in Neural Machine Translation
Phrases play an important role in natural language understanding and machine translation (Sag et al., 2002; Villavicencio et al., 2005). However, it is difficult to integrate them into current neural machine translation (NMT) which reads and generates sentences word by word. In this work, we propose a method to translate phrases in NMT by integrating a phrase memory storing target phrases from ...
Constructing a test to predict the translation performance of English translation MA graduates on legal correspondence and deeds as a profession
As the world keeps evolving and improving across different areas of knowledge, the need for worldwide communication emerges more strongly than ever before, which calls for special attention to the judgments and best choices for mediating between nations. As the language skills for translation are tested separately from translation skills themselves, to assess translation skills pro...
Journal
Journal title: Journal of Cheminformatics
Year: 2021
ISSN: 1758-2946
DOI: https://doi.org/10.1186/s13321-021-00535-x